Shareware Grab Bag




Text File | 1987-05-16 | 33KB | 696 lines

A DISCUSSION OF A86'S ADVANTAGES OVER MICROSOFT'S MASM ASSEMBLER

by Eric Isaacson

In this paper, I'll describe in detail the superiority of my A86 assembler over the largest-selling assembler for the IBM-PC, the MASM assembler, V4.00, marketed by both IBM and Microsoft. I'll cover these topics:

  1. Speed of assembly.
  2. Ease of use.
  3. Code generation (yes, it's possible for an assembler to generate better code!)
  4. Quality of error messages.
  5. Library facilities.
  6. Extra language features.

Speed of Assembly

Until some months after A86 was introduced, Microsoft advertised MASM V4.00 as "the fastest MS-DOS macro assembler, bar none." A86 completely shatters this claim, as the following data show.

To provide a good profile of execution speed, I assembled four programs, each on four different computers. On each computer I assembled the programs twice: once with the assembler and the programs all on a RAM disk, and once with the assembler and the programs on the computer's non-RAM storage medium (a hard disk for two of the computers, a floppy disk for the other two). No attempt was made to ensure that files appeared contiguously on the disks.

For each assembly I took four measurements: the A86 V3.02 execution time, the MASM V4.00 execution time, the time it takes to copy the source program to the NUL device, and the total execution time for MASM, LINK, and EXE2BIN (EXE2BIN is necessary with MASM to produce the COM file that A86 produces all by itself). I then used the times to produce four ratios for each program run:

  1. The ratio of MASM's execution time to A86's execution time.

  2. The same ratio, with the copy-source time subtracted from each assembler's time. Subtracting the copy-source time provides a more accurate measure of actual assembly time, apart from the minimum time the assembler would have to wait for MS-DOS to read the source program.

  3. The ratio of MASM+LINK+EXE2BIN's execution time to A86's execution time.

  4. The same ratio, with the copy-source times subtracted.

The four programs that I used are as follows:

  HUGE is a program that consists of a few memory variable definitions and many copies of the entire 286 instruction set. The hugeness is achieved by 20 repetitions of the same instruction-set file, given as INCLUDEs to MASM and as invocation-file names to A86. Both MASM and A86 blissfully fail to notice that they are assembling the same file repeatedly, so it is as if 20 different files are being assembled. The main file is 51 lines, 527 bytes, and the include file is 500 lines, 40000 bytes. So the total assembly is 10051 lines, 800527 bytes.

  COMM is the same program as HUGE, except that the first character of every line of the INCLUDE file is replaced with a semicolon. Comparing the times of HUGE and COMM indicates how much longer it takes the assemblers to process real instructions than to chew through comments.

  NORM is a real-life program, WS-ASCII by Michael Hoyt, which was written for MASM but which A86 assembles without modification. The program has 434 lines, 17272 bytes.

  TINY is a program consisting of a single NOP instruction, together with the minimum amount of MASM red-tape directives needed to produce a COM file. The program has 7 lines, 90 bytes.

The four computers I used are as follows:

  AT  is a 9MHz IBM PC-AT with 512K of RAM and a PRIAM hard disk.
  LE  is a 4.77MHz Leading Edge Model "D" with 640K of RAM and a 30MB hard disk.
  PC  is a 4.77MHz IBM PC with 512K of RAM and a floppy drive.
  ZEN is an 8MHz Zenith 159 PC with 512K of RAM and a floppy drive.

Table 1 shows the execution times (in seconds) gathered, and Table 2 shows the resulting ratios. The subjective impression is that A86's execution speed is limited only by the speed of file input/output (in fact, one user reports that A86 assembles his source files faster than his editor saves them).

The impression holds true in the case of the floppy-based Zenith computer: A86 actually assembled the HUGE and COMM programs faster than COPY concatenated the source files! In all the other cases, however, the tables reveal a distinct, measurable difference between the time it takes to assemble files and the time it takes to read the files with the COPY program.

------------------------------------------------------------------
TABLE 1. Execution Times (in seconds) for MASM vs. A86 for Four Programs

Disk    Computer  Measurement     HUGE     COMM     NORM     TINY

Hard    AT        MASM           86.33    30.67     3.51     1.48
Hard    AT        A86            14.94     9.66     0.99     0.88
Hard    AT        COPY           10.33     9.50     0.27     0.11
Hard    AT        MASM->COM     100.24    33.06     5.43     2.13

RAM     AT        MASM           55.62    13.14     1.98     0.44
RAM     AT        A86             6.23     2.30     0.44     0.27
RAM     AT        COPY            2.03     2.03     0.11     0.11
RAM     AT        MASM->COM      62.12    13.64     2.67     0.94

Hard    LE        MASM          314.18    77.46    11.29     2.68
Hard    LE        A86            31.16    12.56     2.50     1.43
Hard    LE        COPY           11.47    10.98     0.60     0.50
Hard    LE        MASM->COM     355.27    81.00    16.16     6.18

RAM     LE        MASM          249.14    58.99     8.46     1.59
RAM     LE        A86            24.88     9.17     1.53     0.88
RAM     LE        COPY            8.02     8.02     0.38     0.33
RAM     LE        MASM->COM     261.79    61.14    11.26     3.73

Floppy  PC        MASM          417.71   232.11    25.21    15.54
Floppy  PC        A86           104.14    87.66    11.54     7.25
Floppy  PC        COPY           78.05    79.26     3.51     1.38
Floppy  PC        MASM->COM     487.85   256.72    52.95    39.71

RAM     PC        MASM          249.58    59.32     8.90     2.03
RAM     PC        A86            25.26     9.67     1.81     1.15
RAM     PC        COPY            8.18     8.24     0.44     0.33
RAM     PC        MASM->COM     279.73    61.63    11.81     4.28

Floppy  ZEN       MASM          314.54   269.24    27.19    20.55
Floppy  ZEN       A86           118.58   114.74    13.73    10.16
Floppy  ZEN       COPY          122.71   122.98     4.62     2.53
Floppy  ZEN       MASM->COM     398.29   297.42    59.38    48.78

RAM     ZEN       MASM          140.88    35.65     5.05     0.88
RAM     ZEN       A86            14.83     5.33     0.88     0.44
RAM     ZEN       COPY            4.78     4.78     0.22     0.17
RAM     ZEN       MASM->COM     169.45    36.86     6.65     2.14

MASM->COM is the total time for MASM, LINK, and EXE2BIN; COPY is the
time to copy the source to the NUL device.
------------------------------------------------------------------

As you might expect, the most consistent results are obtained when a RAM disk is used. The ratios don't differ very much from computer to computer.
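The ratios in Table 2 follow mechanically from the Table 1 timings, using the four definitions given earlier. The Python sketch below recomputes them for one row, the RAM-disk AT timings for HUGE. The function name ratios() is hypothetical, and reporting a non-positive A86-minus-COPY difference as infinity is an assumption chosen to match the floppy Zenith entries in Table 2.

```python
def ratios(masm, a86, copy, masm_to_com):
    """Return the four Table 2 ratios for one row of Table 1 timings (seconds)."""

    def safe_ratio(num, den):
        # A non-positive denominator means A86 finished before COPY did;
        # this sketch reports that case as infinity, as Table 2 does.
        return float("inf") if den <= 0 else num / den

    return {
        "ASM": masm / a86,                                       # ratio 1
        "ASM-COPY": safe_ratio(masm - copy, a86 - copy),         # ratio 2
        "COM": masm_to_com / a86,                                # ratio 3
        "COM-COPY": safe_ratio(masm_to_com - copy, a86 - copy),  # ratio 4
    }


# RAM-disk AT timings for HUGE, copied from Table 1.
row = ratios(masm=55.62, a86=6.23, copy=2.03, masm_to_com=62.12)
print({name: round(value, 1) for name, value in row.items()})
# {'ASM': 8.9, 'ASM-COPY': 12.8, 'COM': 10.0, 'COM-COPY': 14.3}
```

Feeding in the floppy Zenith HUGE timings (A86 at 118.58 s, COPY at 122.71 s) makes both COPY-adjusted denominators negative, which is why those Table 2 entries read "infinity".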

The RAM-disk ratios reveal A86 to be an order of magnitude (ten times) faster than MASM at the simple assembly of large programs, and about five times faster for the normal-sized program. In no case does A86 perform at anything less than one and a half times the speed of MASM. I have been claiming that A86 is five times as fast as MASM overall, and these results bear me out.

------------------------------------------------------------------
TABLE 2. Ratios of MASM Times to A86 Times

Disk    Computer  Ratio           HUGE      COMM      NORM      TINY

Hard    AT        ASM              5.8       3.2       3.5       1.7
Hard    AT        ASM-COPY        16.5     132.3       4.5       1.8
Hard    AT        COM              6.7       3.4       5.5       2.4
Hard    AT        COM-COPY        19.5     147.2       7.2       2.6

RAM     AT        ASM              8.9       5.7       4.5       1.6
RAM     AT        ASM-COPY        12.8      41.1       5.7       2.1
RAM     AT        COM             10.0       5.9       6.1       3.5
RAM     AT        COM-COPY        14.3      43.0       7.8       5.2

Hard    LE        ASM             10.1       6.2       4.5       1.9
Hard    LE        ASM-COPY        15.4      42.1       5.6       2.3
Hard    LE        COM             11.4       6.4       6.5       4.3
Hard    LE        COM-COPY        17.5      44.3       8.2       6.1

RAM     LE        ASM             10.0       6.4       5.5       1.8
RAM     LE        ASM-COPY        14.3      44.3       7.0       2.3
RAM     LE        COM             10.5       6.7       7.4       4.2
RAM     LE        COM-COPY        15.1      46.2       9.5       6.2

Floppy  PC        ASM              4.0       2.6       2.2       2.1
Floppy  PC        ASM-COPY        13.0      18.2       2.7       2.4
Floppy  PC        COM              4.7       2.9       4.6       5.5
Floppy  PC        COM-COPY        15.7      21.1       6.2       6.5

RAM     PC        ASM              9.9       6.1       4.9       1.8
RAM     PC        ASM-COPY        14.1      35.7       6.2       2.1
RAM     PC        COM             11.1       6.4       6.5       3.7
RAM     PC        COM-COPY        15.9      37.3       8.3       4.8

Floppy  ZEN       ASM              2.7       2.3       2.0       2.0
Floppy  ZEN       ASM-COPY    infinity  infinity       2.5       2.4
Floppy  ZEN       COM              3.4       2.6       4.3       4.8
Floppy  ZEN       COM-COPY    infinity  infinity       6.0       6.1

RAM     ZEN       ASM              9.5       6.7       5.7       2.0
RAM     ZEN       ASM-COPY        13.5      56.1       7.3       2.6
RAM     ZEN       COM             11.4       6.9       7.6       4.9
RAM     ZEN       COM-COPY        16.4      58.3       9.7       7.3

ASM      = MASM time / A86 time (ratio 1)
ASM-COPY = the same ratio, COPY time subtracted from each (ratio 2)
COM      = MASM+LINK+EXE2BIN time / A86 time (ratio 3)
COM-COPY = the same ratio, COPY time subtracted from each (ratio 4)
"infinity" marks cases where A86 finished before COPY did.
------------------------------------------------------------------

Another claim I have made is that A86 assembles 1000 lines per second under the best conditions (an 8MHz AT with a RAM disk, or contiguous files on a fast hard disk). This claim was based on observations of A86 assembling itself: I put a line counter into A86 and used it to count the source lines. The counter treated macro-expansion lines as separate lines to be counted (which seems fair to me).

Since the original measurement was made, A86's performance assembling itself has eroded, and my claim has been appropriately restricted (to contiguous files on a fast hard disk). This more extensive and organized study illuminates my claim in several ways. There was more of a difference between a hard disk and a RAM disk than I realized. With a RAM disk, A86 assembles 1434 instruction lines a second, and over 4000 comment lines a second. A large, heavily commented program should assemble at around 2000 lines per second. With a hard disk, the rate of assembly of instructions slows to around 600 lines per second.

By subtracting the time for execution of a COPY of the source files, we obtain a closer estimate of the time A86 spends assembling, rather than waiting for input. Under this measure, A86 assembles instructions at 2127 lines per second, and comments at well over 10000 lines per second (the time for assembling comments is so close to the time for copying them that the measure yields wildly differing, near-infinite rates of assembly).

To summarize: A86 is outrageously fast, and MASM isn't.

Ease of Use

Assembly language is routinely condemned by computer literature. The primary thrust of most criticism is that programs are more difficult to write in assembly language than in a high-level language. Intel aggravated the problem when it designed the 8086 assembly language, by forcing users to provide numerous directives in every program to describe the segmentation model being used, even when the program does not concern itself with segmentation. I call these directives "red-tape directives". MASM followed Intel's lead in requiring them; as a result, many programmers have been deterred from using (or even learning!) assembly language because of the intimidating nature of the red-tape directives.

For compatibility, A86 recognizes the red-tape directives required by MASM. But A86 also has a set of defaults that make the red-tape directives unnecessary.

If you wish to code a program consisting of 10 lines, your source file can consist of the 10 lines and nothing else. A86 will assemble the 10-line file directly to a COM file, ready to be executed immediately. With MASM, you must discern and code the extra red-tape directives. After assembling the file, you must feed the resulting OBJ file to LINK to obtain an executable EXE file. If you want the simpler, more compact COM format, you must then feed the EXE file to a program called EXE2BIN.

A particularly annoying feature of the process just described is a pair of conflicting requirements: to LINK with no errors, you must declare a STACK segment, but to get EXE2BIN to give you a COM file, there must be no STACK segment. You must resolve this conflict in favor of EXE2BIN by omitting the STACK segment, and put up with LINK always telling you that you had 1 error.

A86 also has a good set of default settings for writing subroutines to be called by high-level language programs. For example, if your C program calls a function MUL10 that multiplies its single operand by 10, you can code MUL10 with A86 as follows:

    _MUL10:            ; leading underscore required by compiler
      PUSH BP          ; "C" expects BP to be preserved
      MOV BP,SP        ; we use BP to address the stack
      MOV AX,[BP+4]    ; fetch the number N, beyond BP and the return address
      ADD AX,AX        ; 2N
      MOV BX,AX        ; 2N is saved in BX
      ADD AX,AX        ; 4N
      ADD AX,AX        ; 8N
      ADD AX,BX        ; 8N + 2N = 10N
      POP BP           ; BP is restored
      RET              ; go back to caller

The above 11 lines can be the entire source file! If the file is named MUL10.8, then the command

    A86 MUL10.8 MUL10.OBJ

will produce an OBJ file compatible with the standard SMALL model of computation. For the LARGE model, substitute RETF for RET, and fetch N from [BP+6] instead of [BP+4], because a far call pushes a four-byte return address. The OBJ file contains a PUBLIC symbol record for _MUL10; the code will be placed into a segment named _TEXT, with the appropriate combination-types.
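The comments in _MUL10 describe a multiply by 10 done entirely with additions: double, save, double twice more, then add the saved value. The Python model below replays that register sequence so the arithmetic can be checked against an ordinary multiply. It assumes 16-bit unsigned wraparound for AX and BX; the name mul10 and the masking are illustrative and are not anything A86 itself generates.

```python
MASK16 = 0xFFFF  # AX and BX hold 16 bits, so every ADD wraps modulo 65536


def mul10(n):
    """Replay the _MUL10 register sequence on a 16-bit value N."""
    ax = n & MASK16          # MOV AX,[BP+4]  fetch N
    ax = (ax + ax) & MASK16  # ADD AX,AX      2N
    bx = ax                  # MOV BX,AX      save 2N in BX
    ax = (ax + ax) & MASK16  # ADD AX,AX      4N
    ax = (ax + ax) & MASK16  # ADD AX,AX      8N
    ax = (ax + bx) & MASK16  # ADD AX,BX      8N + 2N = 10N
    return ax


# The add-only sequence agrees with a plain multiply for every 16-bit input.
assert all(mul10(n) == (10 * n) & MASK16 for n in range(0x10000))
print(mul10(1234))  # 12340
```

Because the sequence only ever adds, it also yields the correct low 16 bits for a negative C int, whose two's-complement bit pattern is what arrives on the stack.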

If there had been any references to undefined symbols in MUL10.8, those symbols would automatically have been declared external in the OBJ file.

MASM places a barrier of confusing directives between the user and the program. A86 lets the user concentrate on the program, and provide directives only when they are needed to do something unusual.

Code Generation

In an effort to make A86 as compatible as possible with MASM, I have assembled dozens of publicly available source files. While doing this, I made a surprising discovery: A86 generates better code than MASM does! Many programs wind up a few bytes smaller under A86 than they do under MASM. Here's why:

  1. I've noticed that the LEA (Load Effective Address) instruction is misused by many programmers. LEA is essentially a register arithmetic instruction; for example, LEA AX,[BX+SI+200] adds 200 to the contents of the BX and SI registers, and stores the three-term sum in the AX register. No memory reference is made. Many programmers use LEA to load a simple offset into a register: they code LEA SI,BVAR instead of the more verbose MOV SI,OFFSET BVAR. However, the LEA instruction consumes one more byte of object code than the equivalent MOV instruction. When I saw this, I decided to make A86 generate the MOV instead whenever it sees an LEA of a simple memory variable, or of a single-register index with no displacement.

  2. I already mentioned that LEA does not reference memory: the memory address, not the memory contents, is loaded into the destination register. Thus, if there is a segment override associated with the memory address, there is no point in providing the override opcode byte. MASM, however, will generate that opcode byte if the memory operand is not ASSUMEd to be in the default segment register for the operand. The processor ignores this override byte when the program executes. A86 does not generate this wasted override.

  3. If you code a MOV or arithmetic instruction of an immediate value into a forward-referenced memory variable, A86 forces you to specify the size of the variable by appending a B or W to the first reference. If the variable is byte-sized, you are rewarded for your extra effort: A86 will save a byte of code over MASM, which allocates enough space for a word in pass 1, then generates a wasted NOP instruction or a less-efficient opcode form in pass 2 when it sees that only a byte is needed.

  4. If you code a JMP to a forward-referenced label, MASM has no good way to tell whether a short JMP (to a target within 128 bytes) will suffice, unless you explicitly force it with the SHORT operator. A86 assumes that a JMP to a forward-referenced local label is short. Thus, if you don't pay close attention to this sort of thing, your program will be shorter with A86. (If it is a long jump, you can override with the LONG operator, or you can disable this feature entirely with the L switch.)

  5. There is an obscure code optimization that can be performed in the following case: suppose you have a procedure that is to be called from both inside and outside its code segment. The procedure must return via RETF to handle far calls successfully. This means that the CS register must be pushed onto the stack even when the procedure is called from within its own segment. The straightforward way of doing this is via a FAR call whose operand contains the same segment-register value as the caller's. An optimization that saves both program space and execution time is to do a PUSH CS followed by a near CALL to the procedure. A86 performs this optimization when it sees the chance; MASM doesn't.

  6. Finally, I can't resist mentioning that A86 produces much more efficient .OBJ files than MASM does. The efficiencies I refer to here won't necessarily show up in the final program, but they make .OBJ files smaller (saving disk space if you have libraries), and they make linking and loading faster.
     Some examples:

     a. If your source file has interspersed fragments of different segments (e.g., code and data segments), A86 will gather the fragments of each segment together before outputting their contents to the OBJ file. MASM keeps the output fragmented into separate object-contents records.

     b. If you have a sequence of PUBLIC symbols all from the same segment, A86 will output them all in one record. MASM always outputs a separate record for each public symbol. Each wasted record increases the OBJ file size by 6 bytes.

     c. MASM always produces inefficient records for DUP constructs. The innermost part of any nested DUP, containing the actual data to be duplicated, is always given by MASM as a nested DUP with a count of 1. For example, 4 DUP (5) is encoded by MASM as if it were 4 DUP (1 DUP (5)). A86 avoids such inefficiency.

     d. If you CALL a procedure within the same segment as the CALL instruction, the assembler should generate a relative count of the distance from the end of the CALL to the beginning of the procedure. This count can be determined at assembly time, even if the segment is relocated. But MASM doesn't bother; it generates a fix-up record in the OBJ file and lets the linker perform the subtraction. This wastes 11 bytes for the first CALL, and 7 bytes for each subsequent CALL within an object-contents record.

Quality of Error Messages

A86 makes error correction much easier by inserting messages directly into the source file, at the point where the errors occurred. There is also a less immediately obvious improvement in error reporting: the quality of the messages themselves. The content of error messages isn't too important for expert assembly-language programmers; a general hint as to the nature of the problem is enough for the expert to solve it. But for novice programmers, good error messages can save hours of frustration. I've given a lot of thought to my error messages.
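One of the examples that follows is an unbalanced parenthesis, which MASM reports only as "Syntax error". The Python sketch below shows one straightforward way an assembler could recognize that specific case, using a stack of open brackets. It is an assumed technique for illustration, not A86's actual code, and bracket_error is a hypothetical name.

```python
MATCHING_OPENER = {")": "(", "]": "["}  # each closer mapped to the opener it must match


def bracket_error(operand_text):
    """Return a specific message if ( ) and [ ] do not nest properly, else None."""
    stack = []
    for ch in operand_text:
        if ch in "([":
            stack.append(ch)
        elif ch in MATCHING_OPENER:
            # A closer with nothing open, or one closing the wrong kind, is a mismatch.
            if not stack or stack.pop() != MATCHING_OPENER[ch]:
                return "Parenthesis/Bracket Mismatch"
    # Anything still open at the end of the operand text is also a mismatch.
    return "Parenthesis/Bracket Mismatch" if stack else None


print(bracket_error("AX,(5*(17+56/(32+4))"))  # Parenthesis/Bracket Mismatch
print(bracket_error("AX,[BX+SI+200]"))        # None
```

Because the check runs only after a line has already failed to parse, it costs nothing on correct source; that matches the approach of diagnosing errors after they have been detected.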

In some cases, I've actually added code to A86 to diagnose errors after they have been detected, in order to provide a more descriptive message. As a result, A86's error reporting is superior to MASM's.

Some A86 error messages are spectacularly more descriptive than MASM's messages, because A86 has a much better idea of what went wrong than MASM does. Some examples:

    ROR AL
        MASM: Syntax error
        A86:  More Operands Required

    MOV AX,(5*(17+56/(32+4))
        MASM: Syntax error
        A86:  Parenthesis/Bracket Mismatch

    MOV AL,FOO
    ......
    FOO DW ?
        MASM, first line:  Operand types must match
        MASM, second line: Phase error between passes
        A86, second line:  Definition Conflicts With Forward Reference

In other cases, it appears that MASM has just as good an idea of the problem, but A86's phrasing of the error message is clearer for novice programmers. Examples:

    MOV ES,0
        MASM: No immediate mode
        A86:  MOV Segment Register,Immediate Not Allowed

    INC DS:[BX]
        MASM: Operand must have size
        A86:  Is It Byte Or Word?

    STC 17
        MASM: Extra characters on line
        A86:  Operands Not Allowed

    MOV AX,[BX+BP]
        MASM: Already have base register
        A86:  [BX+BP] And [SI+DI] Not Allowed

Library Facilities

A common complaint against assembly language is its lack of built-in power. Most high-level languages have features for formatting, input/output, and/or numerical computation that can be accessed with relative ease. To overcome this limitation, you can build up a library of routines that perform these functions, and call procedures from the library. Both MASM and A86 allow you to build libraries of object files. In order to call a library subroutine using MASM, you must declare the subroutine at the top of every module that uses it, using the EXTRN directive. Then you must tell LINK where to find the library that contains your subroutine. A86 improves upon this scenario slightly, by not requiring the EXTRN directive in the calling module (or, for that matter, the PUBLIC directive in the library module).
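The rule that makes EXTRN unnecessary can be stated as a set operation: whatever a module references but never defines is treated as external. The Python sketch below applies that rule to a hypothetical module. The function implicit_externals and all the symbol names in the example are made up for illustration; an assembler would derive both lists from the source text itself.

```python
def implicit_externals(defined, referenced):
    """List symbols a module references but never defines, in first-use order.

    Sketch of the rule described above: such symbols are treated as external,
    so the module needs no EXTRN declarations for them.
    """
    defined = set(defined)
    externals = []
    for name in referenced:
        if name not in defined and name not in externals:
            externals.append(name)
    return externals


# Hypothetical module: it defines MAIN and L1, and calls two library routines.
print(implicit_externals(
    defined=["MAIN", "L1"],
    referenced=["PRINT_STRING", "L1", "READ_LINE", "PRINT_STRING"],
))
# ['PRINT_STRING', 'READ_LINE']
```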

For programs written entirely in assembly language, a much more significant improvement is offered by A86's source library facility. You feed the library files to the A86LIB tool, and put a SET A86LIB command naming the drive and/or directory into your AUTOEXEC.BAT file. Once you've done that, the library is set up, and calling it is totally effortless and automatic. Whenever A86 sees any undefined symbols, it looks for them in the libraries in the current directory and in the directories named by the A86LIB environment variable. If you don't use the library, no time is wasted unless you mistakenly leave a symbol undefined. Also, since A86 assembles source files faster than LINK links object files, the whole process occurs much more quickly than if LINK is used.

There is another advantage to source-file libraries over object libraries: an added dimension of flexibility for symbols defined by the main program and used by the library. There are several manifestations of this:

  1. A quantity such as an array limit could be an assembly-time constant in some programs, and a run-time variable in others. Since the 86 language uses the same syntax for immediate-operand instructions as for variable-operand instructions, the library can access the array limit without knowing whether it is a constant or a variable. The calling program can then define the limit using EQU or DW as necessary. Since the immediate-versus-variable ambiguity exists at the source level and not at the object-code level, an OBJ library must commit itself to the limit's type.

  2. Similarly, a subroutine call might be direct in some programs and indirect in others. An example of this is the LINES library module given as the sample in the A86 package. The module gathers in standard input, and calls the routine PROCESS_LINE provided by the main program. For simple filter programs, PROCESS_LINE will be a subroutine called directly by the library.
     More complicated programs might perform different actions on a line, depending on the line's context within the input stream. For such programs, PROCESS_LINE can be a word variable pointing to the desired routine, and the call to PROCESS_LINE will be indirect. Again, an object-file library must commit to either a direct or an indirect call.

  3. The calling program could provide the definition for a macro used by the library routine, or the definitions of switches that control conditional assembly within the library. This adds an arbitrary degree of specialization to the library routine, not possible with object libraries.

Extra Language Features

A86's language extension features make A86 programs better and easier to code than MASM programs. The idea behind most of the features is to make the typical program either shorter or less cluttered. Let's consider each feature in turn:

  1. LOCAL SYMBOLS. A86 allows symbols consisting of a single letter followed by one or more decimal digits (L3, X123, Y37, etc.) to be redefined within your program. This allows such symbols to have local scope.

     If you examine most assembler program symbol tables, you will find that the symbols can be partitioned into two levels of significance. About half the symbols are the names of procedures and variables having global significance. If the names of these symbols are chosen intelligently and carefully, the program's readability improves drastically. (They usually aren't chosen well, most often because the assembler restricts symbols to 6 letters, or because the programmer's habits are influenced by such assemblers.)

     The other half of the symbols in a program have a much lower, local significance. They are only place-markers used to implement small loops and local branching (e.g., "skip the next 2 instructions if the Z-flag is set").
     Assigning full-blown names to these labels reduces the readability of your program in two ways. First, it is harder to recognize local jumps for what they are: they are usually the assembly-language equivalent of high-level language constructs like IF statements and WHILE loops. Second, it is harder to follow the global, significant symbols because they are buried in a sea of place-marker symbols in the symbol table. By assigning extremely short local names (typically L0 through L9) to the place-markers in your program, you eliminate the clutter of symbols. There are also other advantages to be reaped from this practice:

     a. Since the complete place-marker symbol is only two characters long, you can indent your program code only two spaces, and still be able to spot the destinations of short jumps quickly. This gives you more room for literate comments on the instruction lines, which in turn reduces the number of full-line comments needed. The whole program becomes less cluttered and more readable.

     b. The only global labels within your program's body will be procedure names. The XREF symbol-table program keys on this: for each reference to a global symbol, it records the name of the last global symbol defined before that reference. Thus the XREF cross-reference is at the procedure level, making an A86 XREF much more useful than other cross-reference listings, which give either line numbers (too precise) or module names (not precise enough).

     c. Your re-use of a local symbol name causes A86's internal storage area for that name to be re-used as well. This effectively doubles your symbol-table capacity.

  2. DUPLICATE DEFINITIONS. Not to be confused with local symbols, the duplicate definition feature allows you to redefine the same global symbol, as long as each succeeding definition has the same value as the first. This has two uses. First, it eases modular program development.
     For example, if two independently developed source files both use the symbol ESC to stand for the ASCII code for ESCAPE, they can both contain the declaration ESC EQU 01B, with no problems when they are combined into the same program.

     The second use for this feature is assertion checking. Your deliberate redeclaration of a symbol name is an assertion that the value of the symbol has not changed, and you want the assembler to issue an error message if it has changed. Example: suppose you have declared a table of options in your DATA segment, and you have another table of initial values for those options in your CODE segment. If you come back months later and add an option to your tables, you want to be reminded to update both tables in the same way. You should declare your tables as follows:

         DATA SEGMENT
         OPTIONS:
           .
           .
         OPT_COUNT EQU $-OPTIONS     ; OPT_COUNT is the size of the table

         CODE SEGMENT
         OPT_INITS:
           .
           .
         OPT_COUNT EQU $-OPT_INITS   ; second OPT_COUNT had better be the same!

     Note that you can do assertion checking in MASM, but it requires extra directives, .ERRE and .ERRNZ, that you cannot possibly remember without looking in the manual. Also, the above example is much cleaner and less obtrusive in A86 than it would be in MASM.

  3. CONDITIONAL RETURNS. A86 allows the operand of a conditional jump instruction to be one of the three return instructions RET, RETF, or IRET. The assembler will find a nearby return instruction of the indicated flavor, and use it as the target for the conditional jump. For example, JZ RET is the replacement for the 8080's RZ (return-if-zero) instruction. With MASM, you have to find the nearby instruction yourself, attach a label to it, and use that label. Note that it does not suffice to attach a label to a single RET instruction and use that label throughout the program: the range of a conditional jump is only about 128 bytes in either direction.
     In addition to the obvious advantage in convenience, there is also an advantage in program readability. If you want to make your programs both more readable and more modular, look for blocks of code containing several jumps to the same local-label location. If you find such a block, break it off into a separate procedure, ending at the local label being jumped to. The unconditional jumps to the local label become RET instructions; the conditional jumps become Jcond RET instructions.

  4. OTHER SOURCE-SIMPLIFICATION FEATURES. A86 has other code-simplification features that make coding more convenient and programs more readable. These are illustrated by the following A86 code fragments, alongside the longer equivalents that MASM forces on you:

         A86 code               longer equivalent
         --------               -----------------
         MOV AX,BX,VALUE        MOV BX,VALUE
                                MOV AX,BX

         IF E MOV AL,BL         JNE BEYOND_MOVE
                                MOV AL,BL
                                BEYOND_MOVE:

         PUSH AX,BX,CX,DX       PUSH AX
                                PUSH BX
                                PUSH CX
                                PUSH DX

         INC SI,2               INC SI
                                INC SI

         TEST BX                TEST BX,BX

         INC B[BX]              INC DS:BYTE PTR [BX]

         MOV FORWARD_VAR W,0    MOV WORD PTR FORWARD_VAR,0
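Four rows of the table above are purely mechanical rewrites: multi-operand PUSH, INC with a count, TEST with one operand, and three-operand MOV. The Python sketch below reproduces those four expansions. It is a hypothetical illustration of the correspondence shown in the table, not A86's parser; it handles only the operand shapes that appear there, and repeating INC for counts other than 2 is an assumption beyond the single INC SI,2 example.

```python
def expand(line):
    """Rewrite a few A86 shorthand forms as the longer sequences from the table."""
    mnemonic, _, rest = line.partition(" ")
    ops = [op.strip() for op in rest.split(",")] if rest else []

    if mnemonic == "PUSH" and len(ops) > 1:
        # PUSH AX,BX,CX,DX -> one PUSH per register, left to right
        return [f"PUSH {op}" for op in ops]
    if mnemonic == "INC" and len(ops) == 2 and ops[1].isdigit():
        # INC SI,2 -> INC SI repeated (counts other than 2 are assumed here)
        return [f"INC {ops[0]}"] * int(ops[1])
    if mnemonic == "TEST" and len(ops) == 1:
        # TEST BX -> TEST BX,BX, which sets the flags according to BX
        return [f"TEST {ops[0]},{ops[0]}"]
    if mnemonic == "MOV" and len(ops) == 3:
        # MOV AX,BX,VALUE -> load BX from VALUE first, then copy BX into AX
        return [f"MOV {ops[1]},{ops[2]}", f"MOV {ops[0]},{ops[1]}"]
    return [line]  # anything else passes through unchanged


for src in ("MOV AX,BX,VALUE", "PUSH AX,BX,CX,DX", "INC SI,2", "TEST BX"):
    print(src, "->", expand(src))
```

The three-operand MOV is expanded right to left, so that BX already holds VALUE by the time it is copied into AX, which is the order the table shows.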